{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://cognitiveclass.ai\"><img src = \"https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png\" width = 300, align = \"center\"></a>\n",
    "\n",
    "<h1 align=center><font size = 5>Lab: Analyzing a real world data-set with SQL and Python</font></h1>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction\n",
    "\n",
    "This notebook shows how to store a dataset into a database using and analyze data using SQL and Python. In this lab you will:\n",
    "1. Understand a dataset of selected socioeconomic indicators in Chicago\n",
    "1. Learn how to store data in an Db2 database on IBM Cloud instance\n",
    "1. Solve example problems to practice your SQL skills "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Selected Socioeconomic Indicators in Chicago\n",
    "\n",
    "The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal.\n",
    "This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.\n",
    "\n",
    "Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.\n",
    "\n",
    "A detailed description of the dataset can be found on [the city of Chicago's website](\n",
    "https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2), but to summarize, the dataset has the following variables:\n",
    "\n",
    "* **Community Area Number** (`ca`): Used to uniquely identify each row of the dataset\n",
    "\n",
    "* **Community Area Name** (`community_area_name`): The name of the region in the city of Chicago \n",
    "\n",
    "* **Percent of Housing Crowded** (`percent_of_housing_crowded`): Percent of occupied housing units with more than one person per room\n",
    "\n",
    "* **Percent Households Below Poverty** (`percent_households_below_poverty`): Percent of households living below the federal poverty line\n",
    "\n",
    "* **Percent Aged 16+ Unemployed** (`percent_aged_16_unemployed`): Percent of persons over the age of 16 years that are unemployed\n",
    "\n",
    "* **Percent Aged 25+ without High School Diploma** (`percent_aged_25_without_high_school_diploma`): Percent of persons over the age of 25 years without a high school education\n",
    "\n",
    "* **Percent Aged Under** 18 or Over 64:Percent of population under 18 or over 64 years of age (`percent_aged_under_18_or_over_64`): (ie. dependents)\n",
    "\n",
    "* **Per Capita Income** (`per_capita_income_`): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population\n",
    "\n",
    "* **Hardship Index** (`hardship_index`): Score that incorporates each of the six selected socioeconomic indicators\n",
    "\n",
    "In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Connect to the database\n",
    "Let us first load the SQL extension and establish a connection with the database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext sql"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Connected: rzb85234@BLUDB'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remember the connection string is of the format:\n",
    "# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name\n",
    "# Enter the connection string for your Db2 on Cloud database instance below\n",
    "# i.e. copy after db2:// from the URI string in Service Credentials of your Db2 instance. Remove the double quotes at the end.\n",
    "%sql ibm_db_sa://rzb85234:xvcq%405d2wqwc2l7q@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Store the dataset in a Table\n",
    "##### In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. To analyze the data using SQL, it first needs to be stored in the database.\n",
    "\n",
    "##### We will first read the dataset source .CSV from the internet into pandas dataframe\n",
    "\n",
    "##### Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL \"magic\" simplifies the process of table creation and writing the data from a `pandas` dataframe into the table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Persisted chicago_socioeconomic_data'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas\n",
    "chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')\n",
    "%sql PERSIST chicago_socioeconomic_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### You can verify that the table creation was successful by making a basic query like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>index</th>\n",
       "        <th>ca</th>\n",
       "        <th>community_area_name</th>\n",
       "        <th>percent_of_housing_crowded</th>\n",
       "        <th>percent_households_below_poverty</th>\n",
       "        <th>percent_aged_16_unemployed</th>\n",
       "        <th>percent_aged_25_without_high_school_diploma</th>\n",
       "        <th>percent_aged_under_18_or_over_64</th>\n",
       "        <th>per_capita_income_</th>\n",
       "        <th>hardship_index</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>0</td>\n",
       "        <td>1.0</td>\n",
       "        <td>Rogers Park</td>\n",
       "        <td>7.7</td>\n",
       "        <td>23.6</td>\n",
       "        <td>8.7</td>\n",
       "        <td>18.2</td>\n",
       "        <td>27.5</td>\n",
       "        <td>23939</td>\n",
       "        <td>39.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>1</td>\n",
       "        <td>2.0</td>\n",
       "        <td>West Ridge</td>\n",
       "        <td>7.8</td>\n",
       "        <td>17.2</td>\n",
       "        <td>8.8</td>\n",
       "        <td>20.8</td>\n",
       "        <td>38.5</td>\n",
       "        <td>23040</td>\n",
       "        <td>46.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>2</td>\n",
       "        <td>3.0</td>\n",
       "        <td>Uptown</td>\n",
       "        <td>3.8</td>\n",
       "        <td>24.0</td>\n",
       "        <td>8.9</td>\n",
       "        <td>11.8</td>\n",
       "        <td>22.2</td>\n",
       "        <td>35787</td>\n",
       "        <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>3</td>\n",
       "        <td>4.0</td>\n",
       "        <td>Lincoln Square</td>\n",
       "        <td>3.4</td>\n",
       "        <td>10.9</td>\n",
       "        <td>8.2</td>\n",
       "        <td>13.4</td>\n",
       "        <td>25.5</td>\n",
       "        <td>37524</td>\n",
       "        <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>4</td>\n",
       "        <td>5.0</td>\n",
       "        <td>North Center</td>\n",
       "        <td>0.3</td>\n",
       "        <td>7.5</td>\n",
       "        <td>5.2</td>\n",
       "        <td>4.5</td>\n",
       "        <td>26.2</td>\n",
       "        <td>57123</td>\n",
       "        <td>6.0</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(0, 1.0, 'Rogers Park', 7.7, 23.6, 8.7, 18.2, 27.5, 23939, 39.0),\n",
       " (1, 2.0, 'West Ridge', 7.8, 17.2, 8.8, 20.8, 38.5, 23040, 46.0),\n",
       " (2, 3.0, 'Uptown', 3.8, 24.0, 8.9, 11.8, 22.2, 35787, 20.0),\n",
       " (3, 4.0, 'Lincoln Square', 3.4, 10.9, 8.2, 13.4, 25.5, 37524, 17.0),\n",
       " (4, 5.0, 'North Center', 0.3, 7.5, 5.2, 4.5, 26.2, 57123, 6.0)]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM chicago_socioeconomic_data limit 5;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problems\n",
    "\n",
    "### Problem 1\n",
    "\n",
    "##### How many rows are in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>78</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(Decimal('78'),)]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql select count(*) from chicago_socioeconomic_data;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "source": [
    "Double-click __here__ for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data;\n",
    "\n",
    "Correct answer: 78\n",
    "\n",
    "-->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(chicago_socioeconomic_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'tuple' object is not callable",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-8-2dfc59934a45>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mchicago_socioeconomic_data\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: 'tuple' object is not callable"
     ]
    }
   ],
   "source": [
    "chicago_socioeconomic_data.shape()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(78, 9)"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chicago_socioeconomic_data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=78, step=1)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chicago_socioeconomic_data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['ca', 'community_area_name', 'percent_of_housing_crowded',\n",
       "       'percent_households_below_poverty', 'percent_aged_16_unemployed',\n",
       "       'percent_aged_25_without_high_school_diploma',\n",
       "       'percent_aged_under_18_or_over_64', 'per_capita_income_',\n",
       "       'hardship_index'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chicago_socioeconomic_data.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 2\n",
    "\n",
    "##### How many community areas in Chicago have a hardship index greater than 50.0?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "ename": "IndentationError",
     "evalue": "unexpected indent (<ipython-input-12-46e9d2ec6c99>, line 2)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"<ipython-input-12-46e9d2ec6c99>\"\u001b[0;36m, line \u001b[0;32m2\u001b[0m\n\u001b[0;31m    where hardship_index > 50.0;\u001b[0m\n\u001b[0m    ^\u001b[0m\n\u001b[0;31mIndentationError\u001b[0m\u001b[0;31m:\u001b[0m unexpected indent\n"
     ]
    }
   ],
   "source": [
    "%sql select count(*) from chicago_socioeconomic_data\n",
    "     where hardship_index > 50.0;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n",
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "(ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: Statement Execute Failed: [IBM][CLI Driver][DB2/LINUXX8664] SQL0104N  An unexpected token \"where\" was found following \"BEGIN-OF-STATEMENT\".  Expected tokens may include:  \"<values>\".  SQLSTATE=42601 SQLCODE=-104\n",
      "[SQL: where hardship_index > 50.0;]\n",
      "(Background on this error at: http://sqlalche.me/e/f405)\n"
     ]
    }
   ],
   "source": [
    "%sql select count(*) from chicago_socioeconomic_data\n",
    "%sql where hardship_index > 50.0;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>38</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(Decimal('38'),)]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql select count(*) from chicago_socioeconomic_data \\\n",
    "     where hardship_index > 50.0;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>38</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(Decimal('38'),)]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql select count(*) from chicago_socioeconomic_data where hardship_index > 50.0;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "source": [
    "Double-click __here__ for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT COUNT(*) FROM chicago_socioeconomic_data WHERE hardship_index > 50.0;\n",
    "Correct answer: 38\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 3\n",
    "\n",
    "##### What is the maximum value of hardship index in this dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>98.0</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(98.0,)]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql select max(hardship_index) from chicago_socioeconomic_data;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "source": [
    "Double-click __here__ for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT MAX(hardship_index) FROM chicago_socioeconomic_data;\n",
    "\n",
    "Correct answer: 98.0\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 4\n",
    "\n",
    "##### Which community area which has the highest hardship index?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>community_area_name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Riverdale</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Riverdale',)]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql select community_area_name \\\n",
    "     from chicago_socioeconomic_data \\\n",
    "     where hardship_index = 98.0;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click __here__ for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "## We can use the result of the last query to as an input to this query:\n",
    "%sql SELECT community_area_name FROM chicago_socioeconomic_data where hardship_index=98.0\n",
    "\n",
    "## or another option:\n",
    "%sql SELECT community_area_name FROM chicago_socioeconomic_data ORDER BY hardship_index DESC NULLS LAST FETCH FIRST ROW ONLY;\n",
    "\n",
    "## or you can use a sub-query to determine the max hardship index:\n",
    "%sql select community_area_name from chicago_socioeconomic_data where hardship_index = ( select max(hardship_index) from chicago_socioeconomic_data ) \n",
    "\n",
    "Correct answer: 'Riverdale'\n",
    "-->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 5\n",
    "\n",
    "##### Which Chicago community areas have per-capita incomes greater than $60,000?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>community_area_name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Lake View</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Lincoln Park</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Near North Side</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Loop</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Lake View',), ('Lincoln Park',), ('Near North Side',), ('Loop',)]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql select community_area_name \\\n",
    "     from chicago_socioeconomic_data \\\n",
    "     where per_capita_income_ > 60000;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Double-click __here__ for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "\n",
    "%sql SELECT community_area_name FROM chicago_socioeconomic_data WHERE per_capita_income_ > 60000;\n",
    "\n",
    "Correct answer:Lake View,Lincoln Park, Near North Side, Loop\n",
    "-->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Problem 6\n",
    "\n",
    "##### Create a scatter plot using the variables `per_capita_income_` and `hardship_index`. Explain the correlation between the two variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * ibm_db_sa://rzb85234:***@dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net:50000/BLUDB\n",
      "Done.\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/scipy/stats/stats.py:1713: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.\n",
      "  return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAasAAAGoCAYAAAD4hcrDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3df7icZXng8e9NcoQTRA5RpHAoBbo0VsUSTf3RuK4FbbBVSVEEXVvW2otrt66KbaNha9faXpZY29pfW1sWrdSf+AMDiopuIrqlV9FgUEDMSoUiBwpBOZRCxAPc+8f7TpiczJwzc87MvO/MfD/XNdfMvDPzzp1zkrnz3HM/zxOZiSRJdXZA1QFIkrQYk5UkqfZMVpKk2jNZSZJqz2QlSaq9lVUH0AO2M0oaJVF1AHXkyEqSVHsmK0lS7Y1CGbAWPnz1rT0936uedUxPzydJw8yRlSSp9kxWkqTaM1lJkmrPZCVJqj2TlSSp9kxWkqTaM1lJkmrPZCVJqj2TlSSp9kxWkqTaM1lJkmrPZCVJqr3IHPrtoJb0B+j1wrN158K40tBwP6sWHFlJkmrPZCVJqj2TlSSp9kxWkqTaM1lJkmrPZCVJqj2TlSSp9kxWkqTaM1lJkmrPZCVJqj2TlSSp9kxWkqTaM1lJkmrPZCVJqj2TlSSp9kxWkqTaM1lJkmpvZdUBaDB6vTOyOw9LGiRHVpKk2jNZSZJqz2QlSao9k5UkqfZssNCS2LAhaZAcWUmSas9kJUmqPZOVJKn2TFaSpNqzwUK10OuGDbBpQxoljqwkSbVnspIk1Z5lQI0s54JJo8ORlSSp9hxZSRXpR1NJLzmSVJ04spIk1Z7JSpJUe5YBJQ1E3Rte6h7fuHNkJUmqvcjMqmNYloj4PPCEHp/2CcDdPT7nUtQlDjCWdoylNWPZX6dx3J2Zp/Y7mGEz9MmqHyJiR2auM45HGUtrxtKasdQ3jmFlGVCSVHsmK0lS7ZmsWrug6gBKdYkDjKUdY2nNWPZXlziGkt9ZSZJqz5GVJKn2TFaSpNozWUmSas9kJUmqPZOVJKn2hj5ZnXrqqQl48eLFy6hcOjain38tDX2yuvvuOiz5JUmDN06ff0OfrCRJo89kJUmqPZOVJKn2+pqsIuJ9EXFXRFzfdGx1RHwxIr5TXh/W9Nh5EXFTROyKiA39jE2SNDz6PbJ6PzB/E7HNwLbMPAHYVt4nIp4MnAU8pXzNX0fEij7HJ0kaAn1NVpn5FeAH8w6fBlxU3r4I2Nh0/KOZ+WBm3gzcBDyzn/FJkoZDFd9ZHZGZdwCU108sj08D32t63m3lMUnSmKtTg0W0ONZyglhEnBMROyJix+7du/scliTVx7h+/lWRrO6MiCMByuu7yuO3AT/e9LyjgdtbnSAzL8jMdZm57vDDD+9rsJJUJ+P6+VdFsroMOLu8fTZwadPxsyLiwIg4DjgB+GoF8UmSamZlP08eER8Bng88ISJuA94GbAE+FhGvBW4FzgDIzBsi4mPAt4CHgNdl5sO9jmnrzhnedcUubp/dw1FTk2zasIaNa/1qTJLqrK/JKjNf2eahU9o8/x3AO/oVz9adM5x3yXXsmSty4MzsHs675DoAE5Yk1VidGiz67l1X7NqbqBr2zD3Mu67YVVFEkqRO9HVkVTe3z+5Z9LhlQknD4gf3/4gPX31r1WEs26uedcyizxmrkdVRU5MLHm+UCWdm95A8WibcunNmgFFKkuYbq2S1acMaJif2XcFpcmIFmzasASwTSlJdjVUZsFHOa1fmm2lTJmx3XJI0GGOVrKBIWO2+g1oRwcO5/6IZK6LV4hqSpEEZqzLgYlolqoWOS5IGw2TVZLpNA0a745KkwTBZNVmsAUOSVI2x+85qIe0aMADWb9nu3CtJqojJap75DRgu0SRJ1bMMuAjnXklS9UxWi2i3RJNzryRpcExWi2i3RFOAyzBJ0oCYrBaxacMaWk0JTrAUKEkDYoNFG82rr7ebEtyuRChJ6i2TVQvzOwDbaVcilCT1lmXAFlp1AM7nZGFJGhyTVQuLlfdWROxtX7fJQpL6z2TVQrvy3tTkBJMTK/YubOvmjJI0GCarFtqtERiBE4QlqQImqxY2rp3m/NNPZHpqkqBYdf38009k9oG5ls+3K1CS+stuwDZabdL4rit2tVy5wq5ASeovR1ZdcAsRSaqGyWoBW3fOsH7Ldo7bfDnrt2wHaFkedPV1Seovy4BttNsa5PzTT+SqzSdXHJ0kjRdHVm24NYgk1YcjqzbadfjNP968hqC7CEtSfziyaqNdh1/z8UapcKZc7NZJwpLUHyarNjrp/LNUKEmDYbJqY/7E4MNWTXDgygN408XXsn7LdrbunOm4VChJWh6T1QI2rp3mqs0n8+4zT+KHc48wu2dun3Lf1KqJlq9zkrAk9ZbJqgPtyn2ZOElYkgbAZNWBdmW9e/fMOUlYkgbA1vUOHDU12XZNwFZrCEqSesuRVQdcE1CSquXIqgONkVOryb9OCpak/jNZdahVua/d+oGN50uSesMy4DI4KViSBsOR1RI0Sn+tmi7AScGS1Gsmqy7NL/214qRgSeoty4BdalX6a2aXoCT1niOrLi1U4pu2G1CS+sJk1aWpVRPc88DcfscPWzXhDsKS1CeWAbuU2d1xSdLyObLq0r179h9VLXTcScOStHyOrLrUyQ7CDe4kLEm9YbLqUjfrBDppWJJ6wzJglxZaJ3A+dxKWpN4wWS1Bp9uCLLS1iCSpc5WVASPiTRFxQ0RcHxEfiYiDImJ1RHwxIr5TXh9WVXyL2bpzhvVbtnPc5stZv2V7y++h3FpEknqjkmQVEdPAG4B1mflUYAVwFrAZ2JaZJwDbyvu102njxMa10+4kLEk9UGUZcCUwGRFzwCrgduA84Pnl4xcBVwJvqSK4hSzUODE/EbmTsCQtXyUjq8ycAf4YuBW4A7g3M78AHJGZd5TPuQN4YqvXR8Q5EbEjInbs3r17UGHvLf252rqkqjR//t03+4OqwxmYqsqAhwGnAccBRwEHR8SrO319Zl6Qmesyc93hhx/erzD30Vz6a8fGCUn91vz5d8jU6qrDGZiqGixeANycmbszcw64BPg54M6IOBKgvL6rovj242rrklSdqr6zuhV4dkSsAvYApwA7gPuBs4Et5fWlFcW3n0Gstu7STJLUWiXJKjOvjohPAF8HHgJ2AhcAjwU+FhGvpUhoZ1QRXyvt5kxNT032ZLX1+Zs6NjoMAROWpLFXWTdgZr4NeNu8ww9SjLJqZ9OGNfvtENzL0l83HYaSNOxe9axjunq+K1h0qJtllpbCpZkkqT2TVRf6OWfKpZkkqT1XXa8Jl2aSpPYcWQ1AJ11+/S4zStIwM1n1WTddfi7NJEmtWQbsMzdglKTlM1n1mV1+krR8Jqs+O3RyoqvjkqT9maz6LKK745Kk/dlg0SeNDsB7Hphr+fhsm+OSpP2ZrPpgfgdgK072laTOWQbsA7cTkaTecmTVB4PYTkSSxokjqz5oV+ILMFFJ0hKYrPpg04Y1tGr2S3AysCQtgcmqDzaunSbbPOZkYEnqnsmqT6bblALtApSk7pms+sQtPySpd+wG7LHm7UAOnZzgoIkDmH1gzi0/JGkZTFY9NH8y8OyeOSYnVvDuM08ySUnSMlgG7CG3A5Gk/jBZ9VC7Tr+Z2T1s3Tkz4GgkaXSYrHpooU6/8y65zoQlSUtksuqhVh2ADZYDJWnpbLDooUYTxbkXX9vy8RknBEvSkjiy6rGNa6fbTggOsBQoSUtgsuoD1waUpN4yWfWBawNKUm+ZrPrEtQElqXdMVn3i2oCS1Dt2A/ZJozPw7Z++gXsemAPgwJX+30CSlsJPzz774dwje2/P7plzcrAkLYHJqo9cK1CSesNk1UcLrRV43ObLWb9lu6MsSeqAyaqPFur8S4qkZVlQkhZnsuqjhdYKbLAsKEmLM1n10NadM6zfsn1viQ/g/NNPZHpqsuWKFg1OFJakhZmseqSxS/DM7J59SnwAV20+mZu3/JIThSVpiUxWPdJJ59/PP+nw/UZYThSWpMWZrHqkXSmvcXzrzhk+ec3MPmsGBvCyZ0zvnUAsSWrNZNUj7Up5jeOtRl4JfOnbu/sdmiQNPZNVjyy2FuBiIy9JUnsmqx7ZuHZ6n86/6alJzj/9xL0lvsVGXpKk9lzItoc2rm3//dOmDWs475Lr9ikF2lwhSZ0xWQ3A1p0ze7+zWhHBw5lMT02yacMamyskqQMmqz5rzL9qjKgeztw7ojJRSVJn/M6qz1x5XZKWz2TVZwutvO4CtpLUGZNVny3U7eeK65LUmY6TVUR8ICIObbr/ExGxrT9hjY6FVl63HChJnemmweIfgKsj4jeBaWAT8FtLfeOImAIuBJ5KsZjDrwG7gIuBY4FbgFdk5j1LfY86aDRRnHvxtS0fv70sB77ril3cPruHo+wSlKT9dDyyysy/BX4duBT4feB5mfnpZbz3nwOfz8wnAT8D3AhsBrZl5gnAtvL+0Nu4drrtiutTqyZartZueVCSHtVNGfBXgPcBvwq8H/hsRPzMUt40Ih4HPA94L0Bm/igzZ4HTgIvKp10EbFzK+euo3XJMmdgtKEmL6KbB4mXAczPzI5l5HvBfeTSxdOt4YDfwdxGxMyIujIiDgSMy8w6A8vqJrV4cEedExI6I2LF793AsBNtuOaZ798y1fL5rBkpqpfnz777ZH1QdzsB0/J1VZm6cd/+rEfHMZbzv04HXZ+bVEfHndFHyy8wLgAsA1q1bl4s8vTZaLcf0rit2MdMiMblmoKRWmj//jv/ppw3N599ydVMG/KmI2BYR15f3nwa8eYnvextwW2ZeXd7/BEXyujMijizPfyRw1xLPPzQWW61dktRdGfB/A+cBcwCZ+U3grKW8aWb+K/C9iGh8Ip8CfAu4DDi7PHY2RTPHSNu4dpqXPWOaFVHsIbwiwg0ZJWmeblrXV5Wlv+ZjDy3jvV8PfCgiHgN8F3gNRfL8WES8FrgVOGMZ5x8KjR2EH85iNP9wJp+8ZoZ1P7HahCVJpW6S1d0R8ZMUc6KIiJcDdyz1jTPzWmBdi4dOWeo5h9FCawearCSp0E2yeh3Fl3pPiogZ4Gbg1X2Jaoz0agdhJxZLGmXddAN+F3hB2WJ+QGbe17+wxsdRU5PL7gacvw1JY2IxYMKSNBIWTVbl8kqtjgOQmX/a45jGSi92ELaUKGnUdTKyOqS8XgP8LEXHHsBLgK/0I6hRtVCpbjklvF6VEiWprhZNVpn5doCI+ALw9Eb5LyJ+D/h4X6MbIYuV6pYzAupFKVGS6qybeVbHAD9quv8jitXR1YF+7hjsxGJJo66bbsAPAF+NiE9RtK//MvD3fYlqBHVSqltqR18vSomSVGfddAO+IyI+Dzy3PPSazNzZn7BGz2KluuV29C23lChJddbttvbXUnxP9Sng+xFxTO9DGk2Ller6WSaUpGHX8cgqIl4PvA24E3gYCIpy4NP6E9poWaxUZ0efJLXXzXdWbwTWZOb3+xXMqFuoVGdHnyS1100Z8HvAvf0KZNzZ0SdJ7XUzsvoucGVEXA482DjoCha9sViZ0LX/JI2zbpLVreXlMeVFPdauTOjaf5LGXTet62/vZyBqz7X/JI27Thay/bPMPDciPk25l1WzzHxpXyIbcd2U9ewUlDTuOhlZfaC8/uN+BjJOui3r2Skoadwt2g2YmdeU119udWk8LyI+2c9AR0m3E4DtFJQ07rppsFjM8T0810jrtKzXXCqcWjXBgSsP4N49c3YDSho7vUxW+32fpdY6KevNLxXe88AckxMrePeZJ5mkJI2dbtcGVA90UtZzrUBJelQvR1bRw3ONtE629GhXKpyZ3cP6LdudHCxprHSVrCLiMcCTKEp+uzKzeTPGt/QysFG32JYe7UqFAXuPOzlY0rjouAwYEb8E/DPwF8BfATdFxIsaj2fmF3of3vhqVSpsLHPfzNKgpHHQzcjqT4Cfz8ybACLiJ4HLgc/1I7Bx16pU2GqkBU4OljT6uklWdzUSVem7wF09jkdN5pcK12/Z7uRgSWOpm2R1Q0R8FvgYRTXqDOBrEXE6QGZe0of4xs5CyzBt2rBmn3Z2cHKwpPHQTbI6iGKX4P9U3t8NrAZeQpG8TFbLtNgyTJ10EUrSKOpm1fXX9DMQdba6+mJdhJI0ijpZdf3NmflHEfGXtF51/Q19iWwMLXd1dTdolDSqOhlZ3Vhe7+hnIFre6upu0ChplC2arDLz0+X1Rf0PZ7wtp4HCDRoljbKOv7OKiJ8Cfhs4tvl1mXly78MaT0tpoGiU/pyDJWmUddMN+HHgb4ALgYcXea6WqJsGivmlv1acgyVpFHSTrB7KzPf0LRJ1rVXpr5lzsCSNik66AVeXNz8dEb8BfAp4sPF4Zv6gT7FpEQuV+KbtBpQ0QjoZWV1D0bLe2AJkU9NjiTsEV6Zd9+D01CRXbfarREmjo5NuwOMGEYi65/JLksZFN1uEnBERh5S33xoRl0TE2v6Fpk4cNPHor3BqcoLzTz/R0p+kkdPNtva/m5n3RcRzgQ3ARRTdgapAoxPwngfm9h578KFHKoxIkvqnm2TVqDX9EvCezLwUeEzvQ1InFpoELEmjpptkNRMRfwu8AvhsRBzY5evVQ8tdR1CShkk3yeYVwBXAqZk5S7E9yKaFX6J+OXRyoqvjkjTMuplnBXBl07EHcXHbykR0d1yShlm386yOAe4pb08BtwK2tvfBYtt9zDY1VjRrd1yShtmiZcDMPC4zj6coAb4kM5+QmY8HXoy7A/dFo9NvZnYPyaPbfWzdObP3Oe3W/HMtQEmjqJvvrH42Mz/buJOZn+PRLe7VQ510+m3asIbJiRX7PMcJwZJGVTcL2d4dEW8FPkhRFnw18P2+RDXmOun0W2g7EXcMljRquklWrwTeRrGQbQJfKY+pxzrdMbjVdiLuGCxpFHVUBoyIFcB5mfnGzFybmU/PzHNdcb0/llPic7KwpFHUUbLKzIeBZ/T6zSNiRUTsjIjPlPdXR8QXI+I75fVhvX7PYbBx7TTnn34i01OTBMUq6p2u+edkYUmjqJsy4M6IuIxix+D7GwczczkdgW8EbgQeV97fDGzLzC0Rsbm8/5ZlnH9odbNjcLNOS4iSNEy66QZcTdFQcTLwkvLy4qW+cUQcTbHO4IVNh0+jWCCX8nrjUs8/ruwSlDSKOh5ZZeZrevzefwa8GTik6dgRmXlH+X53RMQTW70wIs4BzgE45phjehzW8Gp0Ae6Ze5gVETyc6Y7B0ohp/vx7wo+Nz7/rjpNVRBwEvBZ4CnBQ43hm/lq3bxoRLwbuysxrIuL53b4+My8ALgBYt25ddvv6UTS/C/DhzL0jKhOVNDqaP/+O/+mnjc3nXzdlwA8AP0axl9WXgaOB+5b4vuuBl0bELcBHgZMj4oPAnRFxJEB5fdcSzz927AKUNMq6SVb/ITN/F7g/My+i+L7pxKW8aWael5lHZ+axwFnA9sx8NXAZcHb5tLOBS5dy/nHUqy7ArTtnWL9lO8dtvpz1W7bvs8STJFWlm2TVWCF1NiKeChwKHNvjeLYAL4yI7wAvLO+rA71YK7CTNQklqQrdJKsLynlPb6UYAX0LeOdyA8jMKzPzxeXt72fmKZl5QnntpOMO9aIL0FKipLrqZp7VB4CXUYymGu3lR/Q6IC3dQRMH7E02U5MT/N5Ln9JVc4UTiiXVVTfJ6lLgXor9rR7sTzhaivmdgAAPPvRI1+dxQrGkuuomWR2dmaf2LRIt2ULlu25GVps2rNkv6TmhWFIddPOd1T9GxJK6/9RfvSrfLWdNQknqp0VHVhFxHcWWICuB10TEdynKgAFkZj6tvyFqMb0s3y11TUJJ6qdOyoBLXv9Pg2H5TtKoWzRZZea/DCIQLd38XYOnVk2QCW+6+FredcUul1ySNPS6+c5KNbZx7TRXbT6Zd595Ej+ce4TZPXNO7JU0MkxWI8aJvZJGUTet6xoC3XQGNrYUuX12D0e5lYikGnNkNWI6XSPQdQAlDROT1YjpdI1Ay4WShollwBEzvzNwfnmvUfprNS8LXAdQUj2ZrEZQu4m9rdYQnM91ACXVkWXAMdKq9NfMicSS6sqR1RhZqMQ3bTegpBozWY2RQycnmN0zt9/xqckJrtp8cgURSVJnLAOOkYjujktSXZisxsjsA/uPqhY6Lkl1YbIaI51OGJakujFZjZFWE4YBHvjRQ65cIanWTFZjpLET8NTkxD7H73lgzqWWJNWayWrMbFw7zcEH7t8E6lJLkurM1vUR1m5V9W5WZpekOjBZjaj5Sys1VlWHoqGi1dqANlpIqivLgCNqoVXVO12ZXZLqwmQ1otqV9GZm9/Cmi6/lwJUHcNiqCYJiqaXzTz/RpZYk1ZZlwBHVrtQHkMDsnjkmJ1bw7jNPMklJqj1HViOq3ZyqZnYAShoWjqxG1PxNGLPN8+wAlDQMTFYjrHkTxvVbttsBKGloWQYcE3YAShpmjqzGxPyy4FEVbrbYbrKyJLVjshojzWXBqiw0Wbnq2CTVl2VADdRCk5UlqR1HVmNu0CU51yWUtBSOrMZYoyQ3U7a2N0py/dwqxA0gJS2FyWqMVVGSsytR0lJYBhwTrcp9VZTk6tSVKGl4mKzGQLsOvEMnJ5jdM7ff8/tdkqtDV6Kk4WIZcAy0K/dFYElO0lAwWY2BdmW92QfmOP/0E5memlzSViFbd86wfst2jtt8Oeu3bO9rY4ak8WYZcAwstDPwUktyTu6VNEiOrMZAPzrwnNwraZAcWY2BfnTgOblX0iCZrMZErzvwFiotSlKvWQbUkji5V9Igmay0JBvXTu/TSXjYqgkOXHkAb7r4WjsDJfWcyUpLtnHtNFdtPpl3n3kSP5x7hNk9cwNbY1DSeDFZadnsDJTUb5U0WETEjwN/D/wY8AhwQWb+eUSsBi4GjgVuAV6RmfdUEaM6101noLsES1qKqkZWDwG/lZk/DTwbeF1EPBnYDGzLzBOAbeV91Vyn235UsSWJpNFQSbLKzDsy8+vl7fuAG4Fp4DTgovJpFwEbq4hP3em0M9ByoaSGD199695LJyr/zioijgXWAlcDR2TmHVAkNOCJbV5zTkTsiIgdu3fvHlSoaqO5MxBgRcTeJNQ8anIisbR8zZ9/983+oOpwBqbSZBURjwU+CZybmf/W6esy84LMXJeZ6w4//PD+BaiObVw7vXeE9XAmsH+Zz12CpeVr/vw7ZGp11eEMTGXJKiImKBLVhzLzkvLwnRFxZPn4kcBdVcWn7i1W5nMisaSlqiRZRUQA7wVuzMw/bXroMuDs8vbZwKWDjk1Lt1iZb/5E4m63JJE0vqpaG3A98CvAdRFxbXnsfwBbgI9FxGuBW4EzKopPS9DJeoHuEixpKSpJVpn5D0C0efiUQcai3tm0Yc0+e1yBZT5JveGq6+qZ5W5F4oRhSe2YrNRT7jwsqR8qn2clgROGJS3MkZX6qtPSnhOGJS3EkZX6ppu1AJ0wLGkhJiv1TTelvUFMGN66c4b1W7Zz3ObL3SBSGjKWAdU33ZT2lttJuBgbOKThZrJS33QySbhZPycMLzTKM1lJ9WcZUH3Ti9Jer0p3NnBIw82RlfqmF5OEe1W663aUJ6leTFbqq+WU9npZunMpKGm4maxUW70s3fW7gUNSf5msVFu9Lt254rs0vGywUG25WaOkBpOVaquxWePU5MTeYwdN+FdWGkf+y1ftPfjQI3tv3/PAXNslmySNLpOVas3V2CWBDRaquV51BLqxozTcHFmp1nqxGns3q79LqieTlWqtFx2BlhKl4WcZULW21Mm8zWW/bPMc1wWUhofJSrXX7WTe+WsKtuO6gNLwsAyokdOq7Defk4ul4eLISiNnofJegN2A0hAyWWnktFtTcHpqkqs2n1xBRJKWyzKgRo5rCkqjx5GVhtJCk3zdDkQaPSYrDZ1OdhB2OxBptFgG1NBxkq80fhxZaej0cgfhYeVahxo3jqw0dHqxXuAwc61DjSOTlYbOuHf7WQbVOLIMqKEzLt1+7Up9lkE1jkxWGkqj3u23UMdju0nP41IG1XiyDCjV0EKlvnEvg2o8maykGlqo1Ldx7TTnn34i01OTBHDYqgkOXHkAb7r4WtZv2W6jhUaSyUqqocU6HjeuneaqzSfz7jNP4odzjzC7Z87OQI00k5VUQ52W+uwM1LiwwUKqoU47Hu0M1LgwWUk11UnHo52BGheWAaUhZmegxoUjK2mI9XqCtGsOqq5MVtKQ69UE6U62XpGqYhlQEmBnoerNkZUkoH+dhZYW1QuOrCQB/dl6xe1M1CsmK0lAfzoLLS2qVywDSgL6s/XKOE5atuzZHyYrSXv1euuVcZu0bEdl/9SuDBgRp0bEroi4KSI2Vx2PpKUbt0nLlj37p1Yjq4hYAfwv4IXAbcDXIuKyzPxWtZFJWopx2dW5YdBlz9UHP4ZXPeuYvpy7bmqVrIBnAjdl5ncBIuKjwGmAyUoaUqO+q3OzcSt7DlLdyoDTwPea7t9WHpOk2hu3sucg1W1kFS2O5X5PijgHOAfgmGPGYwgsqf4GUfYc18+/yNwvF1QmIp4D/F5mbijvnweQmee3e826detyx44dA4pQkvqu1X/aWxrRz7+Wf/66lQG/BpwQEcdFxGOAs4DLKo5JklSxWpUBM/OhiPjvwBXACuB9mXlDxWFJkipWq2QFkJmfBT5bdRySpPqoWxlQkqT9mKwkSbVnspIk1Z7JSpJUeyYrSVLtmawkSbVXqxUsliIidgP/0uPTPgG4u8fnXIq6xAHG0o6xtGYs++s0jrsz89ROThgRn+/0ucNu6JNVP0TEjsxcZxyPMpbWjKU1Y6lvHMPKMqAkqfZMVpKk2jNZtXZB1QGU6hIHGEs7xtKaseyvLnEMJb+zkiTVniMrSVLtmawkSbU3sskqIt4XEXdFxPVNx1ZHxBcj4jvl9WFNj50XETdFxK6I2NB0/BkRcV352F9ERJTHD4yIi8vjV0fEsW3i+PGI+FJE3BgRN0TEGyuM5aCI+GpEfKOM5e1VxdJ0nhURsTMiPlNlLBFxS3mOayNiR8WxTEXEJyLi2+Xfm+dU9PdlTfnzaFz+LSLOrSiWN5V/Z6+PiEKKSREAAAf1SURBVI+Uf5er+v28sYzjhog4tzxW2b+hsZGZI3kBngc8Hbi+6dgfAZvL25uBd5a3nwx8AzgQOA74Z2BF+dhXgedQbLX8OeBF5fHfAP6mvH0WcHGbOI4Enl7ePgT4f+X7VRFLAI8tb08AVwPPriKWpph+E/gw8Jmqfkfl47cAT5h3rKpYLgJ+vbz9GGCqyt9R+bwVwL8CPzHoWIBp4GZgsrz/MeC/VPEzAZ4KXA+sotgP8P8AJ1T9+xmHS+UB9PUPB8eyb7LaBRxZ3j4S2FXePg84r+l5V5R/iY4Evt10/JXA3zY/p7y9kmJmenQQ06XAC6uOpfzH9nXgWVXFAhwNbANO5tFkVVUst7B/shp4LMDjKD6Yo+pY5r3/LwBXVRELRbL6HrC6fM5nyniq+P2cAVzYdP93gTdX/fsZh8vIlgHbOCIz7wAor59YHm/8Y2i4rTw2Xd6ef3yf12TmQ8C9wOMXevNyOL+WYkRTSSxRlN2uBe4CvpiZlcUC/BnFP/RHmo5VFUsCX4iIayLinApjOR7YDfxdFOXRCyPi4IpiaXYW8JHy9kBjycwZ4I+BW4E7gHsz8wuDjqN0PfC8iHh8RKwCfhH48YpiGSvjlqzaiRbHcoHjC72m9RtEPBb4JHBuZv5bVbFk5sOZeRLFqOaZEfHUKmKJiBcDd2XmNQu8/0BiKa3PzKcDLwJeFxHPqyiWlRTl6/dk5lrgfoqyUhWxFE+OeAzwUuDjC8TRt1jK739OoyijHQUcHBGvHnQcAJl5I/BO4IvA5ylKfA9VEcu4GbdkdWdEHAlQXt9VHr+N4n9HDUcDt5fHj25xfJ/XRMRK4FDgB63eNCImKBLVhzLzkipjacjMWeBK4NSKYlkPvDQibgE+CpwcER+sKBYy8/by+i7gU8AzK4rlNuC2csQL8AmK5FXl35cXAV/PzDvL+4OO5QXAzZm5OzPngEuAn6sgDgAy872Z+fTMfF75nO9UFcs4GbdkdRlwdnn7bIrvjxrHzyq7cI6j+ML0q+Vw/r6IeHbZqfOr817TONfLge1ZFpmbla97L3BjZv5pxbEcHhFT5e1Jig+Bb1cRS2ael5lHZ+axFCWm7Zn56op+LgdHxCGN2xTfh1xf0c/lX4HvRcSa8tApwLeqiKXJK3m0BDj/9YOI5Vbg2RGxqnz9KcCNVf1MIuKJ5fUxwOnlz6bK3894qPpLs35dKP4C3QHMUfxP5bUUdd9tFP8T2gasbnr+71B06uyi7Mopj6+j+OD6Z+CveHTVj4MoyiI3UXT1HN8mjudSDOG/CVxbXn6xolieBuwsY7ke+J/l8YHHMi+u5/Nog0UVP5fjKco53wBuAH6nyp8LcBKwo/w9bQUOqzCWVcD3gUObjlXxO3o7xX+srgc+QNFdV9XP5P9S/AfiG8Apdfg3NA4Xl1uSJNXeuJUBJUlDyGQlSao9k5UkqfZMVpKk2jNZSZJqz2QlSao9k5XGXkT8fkS8oLx9brnm21LOc2FEPLm30UkCnGel+ouIlVks6DmI97oFWJeZdw/i/SR1xpGVBiIijo1iM8GLIuKbUWwuuCqKDei+XK52fkXT+mpXRsQfRsSXgTe2OecREfGpKDaT/EZE/Fx5fGt5vhvi0RXUiYh/j4g/iYivR8S2iDi8PP7+iHh5RLyBYqHUL0XEl8rH3hMRO6Jps8oF/oxXRsS6pvd6RxnXP0XEEYvE/JtRbOh3fTy6oV/jZ3ZhefxDEfGCiLgqik3+nlk+7+AoNhv9WhQrtZ+2jF+VVE9VL6HhZTwuFHuLJcXq5gDvAzYB/wgcXh47E3hfeftK4K8XOefFFKvYQ7E54KHl7dXl9STFcjaPL+8n8J/L2/8T+Kvy9vuBl5e3b6FpX6umc60oY3raAvFcSTEqa7zXS8rbfwS8tV3MwDOA64CDgcdSLPm0tvyZPQScSPEfy2vKn1tQrEK+tTzPHwKvLm9PUWzweXDVv3MvXnp5cWSlQfpeZl5V3v4gsIFi59UvRrHH1lvZdyXqixc538nAe2Dv1if3lsffEBHfAP6JYvXqE8rjjzSd84MU6zYu5hUR8XWKNRWfQrHzayd+RLFJIBRJ5tgFYn4u8KnMvD8z/51iVfH/WD7/5sy8LjMfoUhi2zIzKZJb45y/AGwuf4ZXUqwtd0yHcUpDYWXVAWiszP+C9D7ghsx8Tpvn39/tG0TE8ylWk39OZj4QEVdSfHh3Es/8cx0H/Dbws5l5T0S8f4FzzTdXJhWAh1n431qr/YsaHmy6/UjT/UeazhnAyzJzV4exSUPHkZUG6ZiIaCSmV1KMfA5vHIuIiYh4Shfn2wb8t/K1KyLicRRltXvKRPUk4NlNzz+AYssFgFcB/9DinPcBh5S3H0eRMO8tv3N6URexdRPzV4CN5Xd4BwO/TLGyd6euAF5fbjVBRKztQZxSrZisNEg3AmdHxDeB1cBfUiSPd5Zlu2spNtXr1BuBn4+I6yhKbU+h2L11Zfkef0CREBvuB54SEddQlON+v8U5LwA+FxFfysxvUJT/bqD4ruiqFs/v1n4xZ+bXKb43+ypwNXBhZu7s4px/AEwA34yI68v70kixdV0DERHHUuxZ9dQKY/j3zHxsVe8vaekcWUmSas+RlWovIn4HOGPe4Y9n5jsqiudTwHHzDr8lM6+oIp5WImID8M55h2/OzF+uIh5puUxWkqTaswwoSao9k5UkqfZMVpKk2jNZSZJq7/8DN3bBTyJgMUkAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x432 with 3 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "\n",
    "income_vs_hardship = %sql select per_capita_income_, hardship_index from chicago_socioeconomic_data;\n",
    "plot = sns.jointplot(x='per_capita_income_', y='hardship_index', data=income_vs_hardship.DataFrame())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "source": [
    "Double-click __here__ for the solution.\n",
    "\n",
    "<!-- Hint:\n",
    "# if the import command gives ModuleNotFoundError: No module named 'seaborn'\n",
    "# then uncomment the following line i.e. delete the # to install the seaborn package \n",
    "# !pip install seaborn\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import seaborn as sns\n",
    "\n",
    "income_vs_hardship = %sql SELECT per_capita_income_, hardship_index FROM chicago_socioeconomic_data;\n",
    "plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship.DataFrame())\n",
    "\n",
    "Correct answer:You can see that as Per Capita Income rises as the Hardship Index decreases. We see that the points on the scatter plot are somewhat closer to a straight line in the negative direction, so we have a negative correlation between the two variables. \n",
    "-->\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mSignature:\u001b[0m\n",
       "\u001b[0msns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjointplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mdata\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'scatter'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mstat_func\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mcolor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mheight\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m6\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mratio\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mspace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mxlim\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mylim\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mjoint_kws\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mmarginal_kws\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0mannot_kws\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m    \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\n",
       "\u001b[0;34m\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
       "\u001b[0;31mDocstring:\u001b[0m\n",
       "Draw a plot of two variables with bivariate and univariate graphs.\n",
       "\n",
       "This function provides a convenient interface to the :class:`JointGrid`\n",
       "class, with several canned plot kinds. This is intended to be a fairly\n",
       "lightweight wrapper; if you need more flexibility, you should use\n",
       ":class:`JointGrid` directly.\n",
       "\n",
       "Parameters\n",
       "----------\n",
       "x, y : strings or vectors\n",
       "    Data or names of variables in ``data``.\n",
       "data : DataFrame, optional\n",
       "    DataFrame when ``x`` and ``y`` are variable names.\n",
       "kind : { \"scatter\" | \"reg\" | \"resid\" | \"kde\" | \"hex\" }, optional\n",
       "    Kind of plot to draw.\n",
       "stat_func : callable or None, optional\n",
       "    *Deprecated*\n",
       "color : matplotlib color, optional\n",
       "    Color used for the plot elements.\n",
       "height : numeric, optional\n",
       "    Size of the figure (it will be square).\n",
       "ratio : numeric, optional\n",
       "    Ratio of joint axes height to marginal axes height.\n",
       "space : numeric, optional\n",
       "    Space between the joint and marginal axes\n",
       "dropna : bool, optional\n",
       "    If True, remove observations that are missing from ``x`` and ``y``.\n",
       "{x, y}lim : two-tuples, optional\n",
       "    Axis limits to set before plotting.\n",
       "{joint, marginal, annot}_kws : dicts, optional\n",
       "    Additional keyword arguments for the plot components.\n",
       "kwargs : key, value pairings\n",
       "    Additional keyword arguments are passed to the function used to\n",
       "    draw the plot on the joint Axes, superseding items in the\n",
       "    ``joint_kws`` dictionary.\n",
       "\n",
       "Returns\n",
       "-------\n",
       "grid : :class:`JointGrid`\n",
       "    :class:`JointGrid` object with the plot on it.\n",
       "\n",
       "See Also\n",
       "--------\n",
       "JointGrid : The Grid class used for drawing this plot. Use it directly if\n",
       "            you need more flexibility.\n",
       "\n",
       "Examples\n",
       "--------\n",
       "\n",
       "Draw a scatterplot with marginal histograms:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> import numpy as np, pandas as pd; np.random.seed(0)\n",
       "    >>> import seaborn as sns; sns.set(style=\"white\", color_codes=True)\n",
       "    >>> tips = sns.load_dataset(\"tips\")\n",
       "    >>> g = sns.jointplot(x=\"total_bill\", y=\"tip\", data=tips)\n",
       "\n",
       "Add regression and kernel density fits:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> g = sns.jointplot(\"total_bill\", \"tip\", data=tips, kind=\"reg\")\n",
       "\n",
       "Replace the scatterplot with a joint histogram using hexagonal bins:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> g = sns.jointplot(\"total_bill\", \"tip\", data=tips, kind=\"hex\")\n",
       "\n",
       "Replace the scatterplots and histograms with density estimates and align\n",
       "the marginal Axes tightly with the joint Axes:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> iris = sns.load_dataset(\"iris\")\n",
       "    >>> g = sns.jointplot(\"sepal_width\", \"petal_length\", data=iris,\n",
       "    ...                   kind=\"kde\", space=0, color=\"g\")\n",
       "\n",
       "Draw a scatterplot, then add a joint density estimate:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> g = (sns.jointplot(\"sepal_length\", \"sepal_width\",\n",
       "    ...                    data=iris, color=\"k\")\n",
       "    ...         .plot_joint(sns.kdeplot, zorder=0, n_levels=6))\n",
       "\n",
       "Pass vectors in directly without using Pandas, then name the axes:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> x, y = np.random.randn(2, 300)\n",
       "    >>> g = (sns.jointplot(x, y, kind=\"hex\")\n",
       "    ...         .set_axis_labels(\"x\", \"y\"))\n",
       "\n",
       "Draw a smaller figure with more space devoted to the marginal plots:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> g = sns.jointplot(\"total_bill\", \"tip\", data=tips,\n",
       "    ...                   height=5, ratio=3, color=\"g\")\n",
       "\n",
       "Pass keyword arguments down to the underlying plots:\n",
       "\n",
       ".. plot::\n",
       "    :context: close-figs\n",
       "\n",
       "    >>> g = sns.jointplot(\"petal_length\", \"sepal_length\", data=iris,\n",
       "    ...                   marginal_kws=dict(bins=15, rug=True),\n",
       "    ...                   annot_kws=dict(stat=\"r\"),\n",
       "    ...                   s=40, edgecolor=\"w\", linewidth=1)\n",
       "\u001b[0;31mFile:\u001b[0m      ~/conda/envs/python/lib/python3.6/site-packages/seaborn/axisgrid.py\n",
       "\u001b[0;31mType:\u001b[0m      function\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.jointplot?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "##### Now that you know how to do basic exploratory data analysis using SQL and python visualization tools, you can further explore this dataset to see how the variable `per_capita_income_` is related to `percent_households_below_poverty` and `percent_aged_16_unemployed`. Try to create interesting visualizations!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "##### In this lab you learned how to store a real world data set from the internet in a database (Db2 on IBM Cloud), gain insights into data using SQL queries. You also visualized a portion of the data in the database to see what story it tells."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright &copy; 2018 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python",
   "language": "python",
   "name": "conda-env-python-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "widgets": {
   "state": {},
   "version": "1.1.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
